- 
                Notifications
    
You must be signed in to change notification settings  - Fork 25.6k
 
[ML] Support revoking inference default endpoint authorization #121326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Support revoking inference default endpoint authorization #121326
Conversation
| ClusterState state, | ||
| ActionListener<DeleteInferenceEndpointAction.Response> masterListener | ||
| ) { | ||
| if (modelRegistry.containsDefaultConfigId(request.getInferenceEndpointId())) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change will prevent REST calls from deleting default inference endpoints.
| })); | ||
| } | ||
| 
               | 
          ||
| // TODO should we add a lock on the default model id so we can't attempt to delete it while we're adding it? | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add the default model ids to the lock map while they're being persisted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with leaving it as is? I'm thinking the worst case scenario is:
- Thread 1 starts persist
 - Thread 2 starts delete
 - Thread 2 finishes delete
 - Thread 1 finishes persist
 
But the next call will just redo the delete? And we won't get stuck in some sort of a delete -> put loop? Maybe? The only risk that I see is that the storeDefaultEndpoint is potentially called as part of _xpack/usage, so it is running frequently in some environments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah now that I think about it more I'm not even sure that's possible at least at the moment. The only way we would add a new default endpoint that has a possibility of being deleted would be when a node boots up for EIS. A single node wouldn't try to add the EIS default endpoints and delete them.
Two separate nodes could run into a situation though: Where node A gets the authorization response and begins persisting the default endpoints. Node B boots up but the authorization request fails for some reason, it would then attempt to delete the default endpoints.
It looks like we're not aborting conflicts though so I'd expect one of those calls to succeed and then when another node restarts it would remove the default endpoint eventually.
| DeleteByQueryRequest request = new DeleteByQueryRequest().setAbortOnVersionConflict(false); | ||
| request.indices(InferenceIndex.INDEX_PATTERN, InferenceSecretsIndex.INDEX_PATTERN); | ||
| request.setQuery(documentIdQuery(inferenceEntityId)); | ||
| request.setQuery(documentIdsQuery(inferenceEntityIds)); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting side not, delete by query will not fail if it can't find an id. Which is helpful here because for the revoking of authorization situation, we don't really care if the model exists already or not, just that it is removed if it did exist.
| authorizationCompletedLatch.countDown(); | ||
| }); | ||
| 
               | 
          ||
| getServiceComponents().threadPool() | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is being called from the onResponse of an action listener. Which is probably already on a utility thread but just in case we'll kick off a new one.
| * @param waitTime the max time to wait | ||
| * @throws IllegalStateException if the wait time is exceeded or the call receives an {@link InterruptedException} | ||
| */ | ||
| public void waitForAuthorizationToComplete(TimeValue waitTime) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making this public so I can access it in the integration tests that were created.
| 
           @elasticmachine merge upstream  | 
    
| 
           There are no new commits on the base branch.  | 
    
| 
           Pinging @elastic/ml-core (Team:ML)  | 
    
| @SuppressWarnings("unchecked") | ||
| var listener = (ActionListener<List<Model>>) invocation.getArguments()[0]; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's another API that does the implicit typecasting magic, if you prefer:
ActionListener<List<Model>> listener = invocation.getArgument(0);
| })); | ||
| } | ||
| 
               | 
          ||
| // TODO should we add a lock on the default model id so we can't attempt to delete it while we're adding it? | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with leaving it as is? I'm thinking the worst case scenario is:
- Thread 1 starts persist
 - Thread 2 starts delete
 - Thread 2 finishes delete
 - Thread 1 finishes persist
 
But the next call will just redo the delete? And we won't get stuck in some sort of a delete -> put loop? Maybe? The only risk that I see is that the storeDefaultEndpoint is potentially called as part of _xpack/usage, so it is running frequently in some environments.
…ic#121326) * Starting revoke * Adding integration tests * More integration tests * Adding test for deleting default inference endpoint via rest call * Removing task type any * Addressing feedback and adding test
…ic#121326) * Starting revoke * Adding integration tests * More integration tests * Adding test for deleting default inference endpoint via rest call * Removing task type any * Addressing feedback and adding test
…ic#121326) * Starting revoke * Adding integration tests * More integration tests * Adding test for deleting default inference endpoint via rest call * Removing task type any * Addressing feedback and adding test
…121326) (#121906) * [ML] Support revoking inference default endpoint authorization (#121326) * Starting revoke * Adding integration tests * More integration tests * Adding test for deleting default inference endpoint via rest call * Removing task type any * Addressing feedback and adding test * Fixing tests
…#121326) (#121907) * [ML] Support revoking inference default endpoint authorization (#121326) * Starting revoke * Adding integration tests * More integration tests * Adding test for deleting default inference endpoint via rest call * Removing task type any * Addressing feedback and adding test * Fixing tests
…121326) (#121908) * [ML] Support revoking inference default endpoint authorization (#121326) * Starting revoke * Adding integration tests * More integration tests * Adding test for deleting default inference endpoint via rest call * Removing task type any * Addressing feedback and adding test * Fixing tests
This PR implements the ability to revoke authorization for default endpoints. When a node boots up, it makes a call the EIS gateway. If the gateway returns a list of model ids that does not include one of the default endpoint models, we will revoke authorization. To revoke authorization, we instruct the
ModelRegistryto remove the default endpoints that were absent from the authorization response from its in memory cache. TheModelRegistrythen performs a delete-by-query to remove the inference endpoint ids.I also added a check in the deletion code path to prevent users from deleting default endpoint ids.